Short-Term Demand Forecasting Method in Power Markets Based on the KSVM–TCN–GBRT

With the consumption of new energy and the variability of user activity, accurate and fast demand forecasting plays a crucial role in modern power markets. This paper considers the correlation between temperature, wind speed, and real-time electricity demand and proposes a novel method for forecasting short-term demand in the power market. Kernel Support Vector Machine is first used to classify real-time demand in combination with temperature and wind speed, and then the temporal convolutional network (TCN) is used to extract the temporal relationships and implied information of day-ahead demand. Finally, the Gradient Boosting Regression Tree is used to forecast daily and weekly real-time demand based on electrical, meteorological, and data characteristics. The validity of the method was verified using a dataset from the ISO-NE (New England Electricity Market). Comparative experiments with existing methods showed that the method could provide more accurate demand forecasting results.


Introduction
The inherent volatility and uncertainty of renewable energy sources, such as wind power and photovoltaics, may lead to large deviations between the bid output and the actual output when renewable energy sources participate in the market. In addition, customers change their electricity load when they receive a price change or incentive signal from the supply side, taking into account their own production or consumption. Hence, the need for timely and accurate information on demand changes has imposed higher requirements on the accuracy of demand forecasting [1].
In current electricity forecasting, it is necessary to fully integrate economic, meteorological, and other multidimensional information; use advanced data-driven processing means; and deeply analyze the change patterns and laws of renewable energy and demand [2]. This creates the need to study demand response characteristics under multidimensional variables in the electricity market environment to improve forecasting accuracy [3]. For power system operation, short-term demand forecasts can predict demands from 1 hour to 168 hours in advance [4]. To obtain accurate [...] frequency components were predicted using a long short-term memory (LSTM) neural network, and the components were finally combined to obtain the demand forecast. Han et al. [10] presented a short-term individual residential demand forecasting model based on a combination of deep learning and k-means clustering, which can effectively extract the similarity of residential demand and forecast residential demand accurately at the individual level. It first uses k-means clustering to extract the similarity among residential demand and then employs deep learning to extract complicated patterns of residential demand. Lv et al. [11] designed a LightGBM-optimized LSTM to realize short-term stock price forecasting. To improve the demand prediction accuracy in the case of single sample data, Mei et al. [12] proposed a model based on multiscale temporal features for LSTM. First, wavelet decomposition decomposes historical data into stable components, trend demand, and periodic series such as the response peak-valley magnitude and duration, highlighting different time-scale features. Second, LSTM is used to further extract time-series characteristics and fit the data. Finally, the model directly outputs predicted values for multiple moments.
Amini et al. [13] proposed an autoregressive integrated moving average method for forecasting conventional electrical loads and electric vehicle parking charging demand. Kianoosh et al. [14] proposed to model the nonseasonal and seasonal cycles of load data using autoregressive (AR) and moving average (MA) components, which have been used to forecast electricity demand at different time scales. Moreover, Zhang et al. [15] used Singular Spectrum Analysis (SSA) to preprocess the data and then used a support vector machine (SVM) optimized by the cuckoo search (CS) algorithm to model the resulting sequence with different prediction strategies. Salah et al. [16] used wrapper and embedded feature selection methods to select the optimal features and a genetic algorithm (GA) to find the optimal time lag and number of layers to optimize the predictive performance of the LSTM model, which was used to construct a short-to-medium-term cumulative load forecasting model. Lu et al. [17] proposed a hybrid short-term load forecasting method based on the convolutional neural network (CNN) and the LSTM network. The CNN was first used to extract the feature vector, and the feature vectors were then arranged as a time series and used as the input data for the LSTM network. In [18], a convolutional long short-term memory network (Conv-LSTM) was proposed for electrical load data, and it achieved better accuracy than the traditional prediction algorithm based on linear regression. Chen et al. [19] derived separate predictions from LSTM and LightGBM training. The optimal weighting combination method was used to determine the weighting coefficients and derive the prediction values of the combined model to improve the accuracy of load prediction. In [20], predictions were extrapolated by calculating correlations between potential variables and outputs and predicting future consumption with high performance.
In [21], a hybrid method combining empirical mode decomposition (EMD), particle swarm optimization (PSO), and an adaptive-network-based fuzzy inference system (ANFIS) was proposed for short-term load prediction of microgrids. In [22], sample entropy was used to identify the nonlinearity and uncertainty of the original time series, and the optimal mode of the original series and the optimal input form of the model were determined by the feature selection method. Finally, the least squares SVM tuned by the multiobjective sine cosine optimization algorithm was used to predict the power demand sequence. In [23], the parameters of the LSTM were first optimized using the Sparrow Search Algorithm (SSA); then, the dataset was preprocessed, and finally, the processed data were used for residential load training and prediction. Li et al. [24] combined LSTM with quantile regression to generate multiple quantile results and introduced a combinatorial layer that considers the constraint relationships between quantile prediction values to ensure that the quantile prediction values are reasonable.
The TCN was proposed in 2018 and offers performance advantages over recurrent neural networks (RNNs) in temporal data processing tasks [25]. Following the motivation above, we propose a novel method based on Kernel Support Vector Machine (KSVM)-TCN-Gradient Boosting Regression Tree (GBRT) for improving the short-term demand forecasting accuracy of power markets. The contributions of this paper are as follows: (1) The KSVM is trained on historical data to classify real-time electricity demand from temperature and wind speed features, and a numerical calculation method automatically selects the parameters of the KSVM, yielding a classification sequence of real-time electricity demand for future days that serves as a feature sequence. (2) A multivariable TCN method is used to capture the fluctuation trend of day-ahead demand in the day-ahead market to predict the real-time demand series in the real-time market. (3) The TCN-GBRT method, which integrates time-domain processing, ensemble learning, and parallel feature processing, is proposed. TCN is able to extract features and temporal relationships owing to its residual network and convolutional structure, avoiding vanishing and exploding gradients in deep learning. GBRT can combine multiple weak learners into a single strong learner that takes the best of all the weak learners and achieves better performance.

Materials and Methods
In the support vector machine, $\gamma$ is the sum of the distances from the support vectors to the hyperplane, called the "margin." Finding the maximum margin is equivalent to minimizing $\lVert \omega \rVert^{2}$. Here $\omega = (\omega_1; \omega_2; \ldots; \omega_d)$ is the normal vector of the hyperplane, $b$ is the displacement term, and $\phi(x)$ denotes the feature vector after mapping $x$. The dual problem is expressed in terms of the kernel function $\kappa(\cdot, \cdot)$. The kernel function maps the data from the original space into a high-dimensional Hilbert space, where a more efficient classification hyperplane exists than in the original space.
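For reference, the textbook hard-margin formulation that matches the symbols in this paragraph can be written as follows, with the margin denoted $\gamma$. This is a sketch of the standard form and not necessarily the exact equations used in this paper; the labels $y_i \in \{-1, +1\}$, the multipliers $\alpha_i$, and the sample count $m$ are assumed notation.

```latex
% Margin and primal problem (standard hard-margin form, assumed)
\gamma = \frac{2}{\lVert \omega \rVert}, \qquad
\min_{\omega,\, b} \ \frac{1}{2}\lVert \omega \rVert^{2}
\quad \text{s.t.} \quad
y_i\left(\omega^{\top}\phi(x_i) + b\right) \ge 1, \quad i = 1, \ldots, m

% Dual problem with kernel \kappa(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i
 - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}
   \alpha_i \alpha_j y_i y_j \, \kappa(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{m} \alpha_i y_i = 0, \quad \alpha_i \ge 0
```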
Suppose that $\{x_j^{(i)}\}_{j=1}^{N_i} \subset \mathbb{R}^{d}$ is the set of training samples in class $i$, where $N_i$ is the number of training samples in class $i$ ($i = 1, 2, \ldots, L$). The Gaussian radial basis function (RBF) kernel is defined for $x, z \in \mathbb{R}^{d}$, with $\sigma \in \mathbb{R} \setminus \{0\}$ as its parameter.
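As an illustration of the Gaussian RBF kernel, the following minimal sketch evaluates it on a few samples. It assumes the common parameterization exp(-||x - z||^2 / (2 sigma^2)), which may differ from the paper's exact form by a constant factor in the exponent. The function names and the (temperature, wind speed) values are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel, assumed form exp(-||x - z||^2 / (2 * sigma^2))."""
    if sigma == 0:
        raise ValueError("sigma must be non-zero")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(samples, sigma):
    """Kernel (Gram) matrix with entries K[i][j] = rbf_kernel(x_i, x_j)."""
    return [[rbf_kernel(x, z, sigma) for z in samples] for x in samples]


# Hypothetical (temperature, wind speed) samples, for illustration only.
samples = [(20.0, 3.0), (21.0, 3.5), (5.0, 9.0)]
K = gram_matrix(samples, sigma=2.0)
```

Nearby samples give kernel values close to 1 and distant samples give values close to 0, which is the property the later parameter selection step relies on.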

TCN Model.
TCN is a one-dimensional fully convolutional network that combines causal convolution, dilated convolution, and residual connections. Causal convolution means that the output at a given time step is convolved only with elements from that time step and earlier in the previous layer, which ensures that no information leaks from the future. Dilated convolution is designed to capture a sufficiently long history of information without a dramatic increase in the depth of the network model.
Define the one-dimensional input sequence $X = (x_1, x_2, \ldots, x_T)$ and let $f: \{0, 1, \ldots, k-1\} \to \mathbb{R}$ be the filter of the dilated convolution. The TCN convolution operation is defined as in the following equation:

$$\mathrm{Con}(s) = (X *_{d} f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i},$$

where Con is the convolution operation; $d$ is the dilation factor; $s$ is the current position in the sequence; $k$ is the convolution kernel size; and $s - d \cdot i$ points in the direction of the past. A convolution with dilation factors $d = 1, 2, 4$ and kernel size $k = 3$ is shown in Figure 1.
As shown in Figure 2, the intermediate mapping sits inside the residual connection, as shown in equation (6). The residual network fits several nonlinear layers between the input and output data. The more features that are extracted, the closer the residual $F(x)$ is to 0. Hence, when the network reaches an optimal structure, $F(x)$ is pushed to 0 as the network layers deepen, leaving only the identity mapping $x$. This overcomes the problem of TCN degradation due to increasing network layers. With the residual connection, the block output is $F(x) + x$, so its derivative with respect to $x$ is $\partial F / \partial x + 1$, that is, the derivative plus the constant term 1, as in equations (6) and (7). Here $F(x)$ is the residual function and $x$ is the input carried by the identity mapping. The error can therefore still be effectively backpropagated, even if the derivative $\partial F / \partial x$ is small.
Computational Intelligence and Neuroscience
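The dilated causal convolution and the residual connection described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the sequence is zero-padded on the left, the residual block contains a single convolution followed by ReLU (practical TCN blocks typically stack two convolutions with normalization and dropout), and all function names are hypothetical.

```python
def dilated_causal_conv(x, weights, dilation):
    """Compute sum_i weights[i] * x[s - dilation * i] for every position s.

    Positions before the start of the sequence are treated as zero, so the
    output at position s never depends on any element after s.
    """
    k = len(weights)
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i in range(k):
            idx = s - dilation * i
            if idx >= 0:
                acc += weights[i] * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """History length seen by a stack with one convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


def residual_block(x, weights, dilation):
    """Simplified residual block: output = x + ReLU(conv(x))."""
    fx = [max(0.0, v) for v in dilated_causal_conv(x, weights, dilation)]
    return [xi + fi for xi, fi in zip(x, fx)]


y = dilated_causal_conv([1, 2, 3, 4, 5], [1, 1, 1], dilation=2)
rf = receptive_field(3, [1, 2, 4])  # kernel size 3, dilations 1, 2, 4
```

With kernel size 3 and dilations 1, 2, 4, three layers already cover 15 time steps, which illustrates how dilation lengthens the visible history without adding many layers.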

GBRT Model.
The GBRT algorithm is able to compensate for the tendency of the CART algorithm to overfit small sample data or to produce unstable, inaccurate predictions. It is an iterative decision tree algorithm that consists of three parts: CART, the Gradient Boosting algorithm, and the shrinkage idea. The basic idea of the algorithm is to use the Boosting method to combine multiple weak learners with low prediction accuracy into a strong learner with high prediction accuracy, that is, to reduce the residuals of the previous model by learning again so that the next generated model has a smaller error. The gradient iteration continuously improves the combined model, making GBRT a decision tree ensemble learning algorithm designed to control the model learning rate and prevent overfitting. That is, it does not fully trust each residual tree and instead approaches the target gradually by learning from the residuals of multiple trees.
For continuous data types, the loss function is the classical loss function in Boosting, that is, the sum of squared errors, which is calculated as shown in (4). After $M$ iterations, the prediction is shown in equations (8) and (9).
Here $x$ denotes the input variables, $y$ the output variable, and $i$ the iteration count. The key to improving the prediction accuracy of the GBRT model is the calculation of the residuals, and this paper uses the method proposed by Friedman [26] to calculate the residuals based on the negative gradient of the loss function.
In this paper, the negative gradient of the loss function is used as an approximation to the residuals in the boosted tree algorithm. Hence, $g_{im}$, the value for the $i$-th sample in the $m$-th round, is calculated as the negative gradient of the loss function evaluated at the model from the previous round. $f_m(x)$ can be calculated from $\beta_m$ and $h_m$, where $\beta_m$ denotes the optimal step for each iteration and is calculated as shown in (13), and $h_m(x)$ is the decision tree created in the $m$-th iteration. The prediction is shown in (11) and (12).
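The boosting loop described above can be sketched as follows for the squared-error loss, where the negative gradient is simply the residual between the target and the current prediction. This is a minimal illustration and not the authors' implementation: each weak learner is a one-split regression tree (a stump) on a single feature, the optimal step is absorbed into the least-squares leaf values, and a fixed shrinkage factor is assumed. All names are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree to the targets by least squares."""
    mean = sum(targets) / len(targets)
    # Start from the "no split" solution, then try every midpoint threshold.
    best = (sum((t - mean) ** 2 for t in targets), None, mean, mean)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - lm) ** 2 for t in left)
               + sum((t - rm) ** 2 for t in right))
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    if thr is None:
        return lambda x: mean
    return lambda x: lm if x <= thr else rm


def fit_gbrt(xs, ys, n_rounds=50, shrinkage=0.1):
    """Gradient boosting with stumps under the loss 0.5 * (y - f(x))^2."""
    f0 = sum(ys) / len(ys)            # initial constant model
    preds = [f0] * len(ys)
    trees = []
    for _ in range(n_rounds):
        # Negative gradient of the squared-error loss is the residual y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrinkage: only partially trust each residual tree.
        preds = [p + shrinkage * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + shrinkage * sum(t(x) for t in trees)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 10, 10, 10, 20, 20, 20, 20]
model = fit_gbrt(xs, ys, n_rounds=100, shrinkage=0.1)
```

With a shrinkage of 0.1, each round removes only a tenth of the remaining residual, so many rounds are needed, but no single tree dominates the final prediction.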

Short-Term Demand Forecasting Method Based on KSVM-TCN-GBRT.
This paper takes into account day-ahead demand, real-time demand, date features, and meteorological features. An RBF-kernel SVM method is used to capture the relationship between temperature, wind speed, and real-time power demand. The TCN-GBRT method is also proposed to forecast the next day's real-time demand and the next week's real-time demand. Processing datasets are shown in Table 1. The KSVM-TCN-GBRT method processing steps for power markets are shown in Figure 3. The short-term demand forecasting framework based on the KSVM-TCN-GBRT method is shown in Figure 4.
In this paper, the parameters of the RBF-kernel SVM are selected automatically by means of temperature and wind speed. Temperature and wind speed are the main influencing factors in the new energy power market, including photovoltaic and wind power. Here, temperature and wind speed are used to measure the class separability of real-time demand in the feature space. Within the same class, temperatures and wind speeds should be as close as possible; between different classes, the greater the distance that can be created between them, the better. Hence, the mean of the values of the normalized kernel function over the samples in the same class, $w(\gamma)$, is as shown in equation (13), where $|\Omega_i|$ is the number of samples in class $i$ and $w(\gamma)$ is close to 1 [29]. The RBF kernel function $k(x, y, \gamma)$ is as in equation (14). It can be seen that $0 \le w(\gamma) \le 1$ and $0 \le b(\gamma) \le 1$ if $k(x, y, \gamma) \ge t$, which is equivalent to the optimization problem in equation (15).
In this paper, the KSVM-TCN-GBRT parameter settings are shown in Table 2.
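One way to read the parameter selection idea above is sketched below: compute the mean kernel value over same-class pairs (w) and over different-class pairs (b), then pick the kernel parameter that makes w close to 1 and b close to 0. This is only a sketch under explicit assumptions: the kernel is taken as exp(-gamma * ||x - y||^2), the objective (1 - w) + b is one plausible choice rather than the paper's stated objective, a simple candidate grid replaces the numerical optimizer, and all names and sample values are hypothetical.

```python
import math


def rbf(x, y, gamma):
    """Assumed kernel form: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def within_between(classes, gamma):
    """Return (w, b): mean kernel value within classes and between classes."""
    w_sum = b_sum = 0.0
    w_cnt = b_cnt = 0
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            total = sum(rbf(x, y, gamma) for x in ci for y in cj)
            if i == j:
                w_sum += total
                w_cnt += len(ci) * len(cj)
            else:
                b_sum += total
                b_cnt += len(ci) * len(cj)
    return w_sum / w_cnt, b_sum / b_cnt


def select_gamma(classes, candidates):
    """Pick the candidate minimizing (1 - w) + b (assumed objective)."""
    def objective(gamma):
        w, b = within_between(classes, gamma)
        return (1.0 - w) + b
    return min(candidates, key=objective)


# Hypothetical (temperature, wind speed) samples for two demand classes.
classes = [[(0.0, 0.0), (0.1, 0.0)], [(5.0, 5.0), (5.1, 5.0)]]
best_gamma = select_gamma(classes, [0.001, 0.1, 10.0])
```

A very small parameter makes all samples look alike (b near 1), while a very large one makes even same-class samples look different (w well below 1), so the selected value sits between the two extremes.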
Meteorological factors such as temperature and wind affect new energy sources such as photovoltaics and wind power. In this paper, hourly datasets are extracted to create quarterly and similar-day datasets based on actual electricity spot market data and weather data. This paper classifies real-time electricity demand by numerical intervals, using wind speed and temperature as the main features and 100 as the interval width. An RBF-kernel SVM classifier is used, and its hyperparameters are tuned to their optimum by numerical optimization. The wind speed and temperature are divided into groups whose members are particularly close to each other; the greater the difference between the different groups, the better.
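The interval-based labeling of real-time demand can be illustrated with a short sketch. It assumes that the class label is simply the index of the 100-wide interval containing the demand value. The unit (for example MW), the function name, and the sample values are assumptions for illustration.

```python
def demand_class(demand, width=100):
    """Map a demand value to its interval index: [0, 100) -> 0, [100, 200) -> 1."""
    if demand < 0:
        raise ValueError("demand must be non-negative")
    return int(demand // width)


# Hypothetical hourly real-time demand values.
hourly_demand = [11980.0, 12045.5, 12345.6, 12999.9]
labels = [demand_class(d) for d in hourly_demand]
```

These labels would then serve as the targets of the RBF-kernel SVM, with temperature and wind speed as the input features.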
This master feature classification sequence is fed as a feature column into the subsequent deep learning neural network. KSVM can effectively solve machine learning problems with small samples and has good generalization ability; it can compensate for the problems of neural network structure selection and local minima.
In the electricity market, a real-time electricity demand forecast is a multivariate time series problem involving the day-ahead demand and day-ahead price. In this paper, a multivariate TCN model is developed for supervised learning, and the dynamic relationships between its variables are extracted. We take the time series of the day-ahead demand and day-ahead price of the day-ahead market at time T as the cause and the real-time demand series of the real-time market at T + 24 and T + 168 as the effect. By increasing the number of layers, changing the dilation factors, revising the filters, and adjusting the length of the historical sequence, we can avoid the vanishing and exploding gradient problems of RNN model prediction and obtain longer-term memory and dynamic analysis capabilities. Also, as a convolutional structure, a TCN can slide a one-dimensional convolutional kernel to receive inputs of arbitrary length and can be processed massively in parallel for faster training and verification. This effectively guarantees the timeliness of power prediction.
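The cause-and-effect framing above amounts to building supervised pairs from the hourly series: a window of day-ahead demand and price ending at hour T as the input, and the real-time demand at T + 24 or T + 168 as the target. A minimal sketch follows. The window length `history`, the function name, and the toy series are assumptions for illustration.

```python
def make_supervised(da_demand, da_price, rt_demand, history, horizon):
    """Build (inputs, targets) for a multivariate sequence model.

    inputs[n]  : list of `history` (day-ahead demand, day-ahead price) pairs
                 ending at hour t
    targets[n] : real-time demand at hour t + horizon
    """
    inputs, targets = [], []
    n = len(rt_demand)
    for t in range(history - 1, n - horizon):
        start = t - history + 1
        window = list(zip(da_demand[start:t + 1], da_price[start:t + 1]))
        inputs.append(window)
        targets.append(rt_demand[t + horizon])
    return inputs, targets


# Toy hourly series of length 200, for illustration only.
da_demand = list(range(200))
da_price = [2 * v for v in range(200)]
rt_demand = [1000 + v for v in range(200)]

X_day, y_day = make_supervised(da_demand, da_price, rt_demand, 24, 24)
X_week, y_week = make_supervised(da_demand, da_price, rt_demand, 24, 168)
```

The same windowing function serves both the day-ahead (24 hour) and week-ahead (168 hour) targets; only the horizon changes.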
In this paper, one-hot encoding of meteorological features and date features is used to process the electricity market dataset into a non-high-dimensional, nonsparse set of values suitable for GBRT forecasting. GBRT is an ensemble learning method that uses decision trees as weak learners and learns iteratively from the residuals of the decision tree predictions. This makes the GBRT model highly interpretable and robust, and it automatically discovers higher-order relationships between day-ahead market sequences and real-time demand characteristics.
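As a small illustration of one-hot encoding for date features, the sketch below encodes the day of the week and the month of a date. The choice of these two features and the function names are assumptions; the paper does not list its exact date features in this section.

```python
from datetime import date


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def date_features(d):
    """Weekday one-hot (7 values, Monday first) followed by month one-hot (12)."""
    return one_hot(d.weekday(), 7) + one_hot(d.month - 1, 12)


features = date_features(date(2021, 3, 31))
```

Each date thus becomes a fixed-length vector of 19 values with exactly two entries set to 1, which can be concatenated with the electrical and meteorological features.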

Performance Evaluation.
In this paper, the absolute percentage error (APE) is shown in equation (15), the mean absolute percentage error (MAPE) in equation (16), the root mean square error (RMSE) in equation (17), and the mean absolute error (MAE) in equation (18). In these definitions, $N$ is the total number of test samples, $P_i^{\mathrm{pre}}$ is the $i$-th demand prediction, and $P_i^{\mathrm{real}}$ is the $i$-th actual demand.
[...] Figure 5. The four series of forecast dates are represented in order from top to bottom: original series, trend series, seasonal series, and residual series. The maximum real-time demand at different temperatures is shown in Figure 6. The maximum real-time demand at different wind speeds is shown in Figure 7.
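The four error measures named above (APE, MAPE, RMSE, and MAE) can be computed as in the following sketch, which uses their standard definitions. Expressing APE and MAPE as percentages (multiplied by 100) is an assumption, as are the function names and the sample values.

```python
import math


def ape(pred, real):
    """Absolute percentage error of a single forecast, in percent."""
    return abs(pred - real) / abs(real) * 100.0


def mape(preds, reals):
    """Mean absolute percentage error over all test samples, in percent."""
    return sum(ape(p, r) for p, r in zip(preds, reals)) / len(reals)


def rmse(preds, reals):
    """Root mean square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, reals)) / len(reals))


def mae(preds, reals):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(preds, reals)) / len(reals)


# Hypothetical predicted and actual demand values.
preds = [110.0, 190.0, 300.0]
reals = [100.0, 200.0, 300.0]
scores = (mape(preds, reals), rmse(preds, reals), mae(preds, reals))
```

MAPE is scale-free and therefore comparable across days with different demand levels, whereas RMSE and MAE are in the units of demand and RMSE penalizes large errors more heavily.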

Results and Discussion
In this paper, the comparison method parameter settings are shown in Table 3. [...] Table 4. The result of the comparison method is shown in Table 5. Figure 8 shows the results of the KSVM-based real-time demand classification, considering the forecast results for temperature and wind speed for March 31, 2021. Figure 9 shows the comparison between the predicted and actual demand power for each model from March 30, 2021. It can be seen that the forecasts of the proposed models matched the actual demand power. In particular, the proposed models could capture occasional fluctuations. [...] Table 6. Figure 10 shows the results of the KSVM-based real-time demand classification, considering the forecasted temperature and wind speed from March 25, 2021, to March 31, 2021. Figure 11 shows the comparison between the predicted and actual demand power for each model from March 31, 2021. It can also be seen that the proposed models matched the actual demand power better than the other forecasts.

Conclusions
This paper proposed a novel method for short-term demand forecasting in power markets based on KSVM-TCN-GBRT. The advantages of this method over previous methods are as follows: (1) A data-driven method for short-term demand forecasting based on KSVM-TCN-GBRT was designed and improved, and temperature and wind speed were used to characterize real-time demand to improve the accuracy of market demand forecasting. (2) We adopted a model structure consisting of data classification, a temporal convolutional network, and an ensemble forecasting model for daily and weekly forecasts. Our proposed model can perform multistep forecasting and improve accuracy by focusing on each feature differently. (3) CNN-LSTM, LSTM with the attention mechanism, bidirectional LSTM, and TCN were used for forecasting and comparative analysis, and the experimental results indicated that the proposed prediction method can reduce the prediction error and improve the prediction accuracy.
Data Availability
The data of the models and algorithms used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.